11. Quality: Visual Assessment 2


Quiz

Quality: Visual Assessment

Using the Jupyter Notebook below, identify the data quality issues in the following list:

SOLUTION:
  • The *given_name* of the patient with *patient_id* 9
  • 'u' next to the start dose and end dose in the *auralin* and *novodra* columns
  • Lowercase names in the *treatments* and *adverse_reactions* tables
  • 280 records in the *treatments* table
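
Once an issue has been spotted visually, a quick programmatic check can confirm it. The following is a minimal sketch, assuming the tables are loaded as pandas DataFrames. The rows are hypothetical stand-ins, not the course data, and the "start - end" dose layout is an assumption based on the description above.

```python
import pandas as pd

# Hypothetical stand-in rows; the real tables live in the course workspace.
treatments = pd.DataFrame({
    "given_name": ["anna", "marc"],
    "surname": ["lindqvist", "okafor"],
    "auralin": ["30u - 35u", "-"],   # assumed "start - end" layout
    "novodra": ["-", "28u - 31u"],
})

# Lowercase names: every given_name equals its own lowercased form.
all_lowercase = (
    treatments["given_name"] == treatments["given_name"].str.lower()
).all()

# Unit suffix: dose strings that contain a digit followed by 'u'.
has_unit_suffix = treatments["auralin"].str.contains(r"\du", regex=True)

# Record count: compare this against the number of records you expect.
n_records = len(treatments)

print(all_lowercase, has_unit_suffix.tolist(), n_records)
```

On the real *treatments* table, `len(treatments)` is the figure to compare against the expected number of records.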


Solution

Quality Visual Assessment 2 Solution

Recording date of birth as month/day/year rather than day/month/year isn't a data quality issue: either representation is valid, and the entire *birthdate* column uses a single format consistently.
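
One way to confirm that a date column fits a single format is to parse it with an explicit format string and look for failures. This is a sketch only, using hypothetical birthdates rather than the course data, and assuming pandas is available.

```python
import pandas as pd

# Hypothetical birthdates in month/day/year form.
birthdates = pd.Series(["7/10/1976", "4/3/1967", "12/31/1990"])

# Parse with an explicit format; values that do not fit become NaT.
parsed = pd.to_datetime(birthdates, format="%m/%d/%Y", errors="coerce")

# True when every value parsed under the month/day/year format.
single_format = parsed.notna().all()

# A day/month/year value with a day above 12 fails the same parse.
mismatch = pd.to_datetime(
    pd.Series(["31/12/1990"]), format="%m/%d/%Y", errors="coerce"
)
mismatch_detected = mismatch.isna().all()
```

Note that a value such as 4/3/1967 is valid under both layouts, so this check can only rule out values that cannot fit the assumed format; it cannot prove which layout was intended.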

Accents on given names and surnames (e.g., Tám Liễu) aren't data quality issues because this is how these names are actually spelled. You will have to be aware of these accents whenever performing operations that require this text (merging tables on the name column, for example), but they aren't quality issues.
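
The reason accents need care when matching on names is that the same accented name can be stored as different sequences of Unicode code points. The sketch below uses the standard library's `unicodedata` module; `normalize_name` is a hypothetical helper, not part of the course materials.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Return the name in NFC form with surrounding whitespace removed."""
    return unicodedata.normalize("NFC", name.strip())


# The same name in composed (NFC) and decomposed (NFD) form.
composed = unicodedata.normalize("NFC", "Tám Liễu")
decomposed = unicodedata.normalize("NFD", composed)

# The two strings render identically but compare as different,
# so a join on the raw name column would miss this match.
raw_match = composed == decomposed

# After normalizing both sides, the names match again.
normalized_match = normalize_name(composed) == normalize_name(decomposed)

print(raw_match, normalized_match)
```

Applying the same normalization to the name columns of both tables before merging keeps the accents intact while avoiding spurious mismatches.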

Weight stored as a float (one decimal place) and height stored as an integer (no decimal places) are both fine. This is just how the data were recorded. Height may therefore be slightly inaccurate (by a fraction of an inch), but this is an example of an acceptable limitation.
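
To see how each column is stored, the dtypes can be inspected directly. This is a small sketch with hypothetical values; the column names `weight` and `height` are assumed from the description above.

```python
import pandas as pd

# Hypothetical measurements: weight with one decimal, height as whole inches.
patients = pd.DataFrame({"weight": [121.7, 118.8], "height": [66, 70]})

# Inspect the storage type of each column.
weight_is_float = pd.api.types.is_float_dtype(patients["weight"])
height_is_integer = pd.api.types.is_integer_dtype(patients["height"])

print(patients.dtypes)
```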
